An Effective Approach for Imbalanced Classification: Unevenly Balanced Bagging
Abstract
Learning from imbalanced data is an important problem in data mining research. Much prior work addresses it by using sampling methods to generate an equally balanced training set and so improve the performance of prediction models, but it remains unclear which class distribution ratio is best for training a prediction model. Bagging is one of the most popular and effective ensemble learning methods for improving the performance of prediction models; however, it has a major drawback on extremely imbalanced data sets. It is also unclear under which conditions bagging is outperformed by other sampling schemes for imbalanced classification. These issues motivate us to propose a novel approach, unevenly balanced bagging (UBagging), to boost the performance of prediction models for imbalanced binary classification. Our experimental results on 32 imbalanced data sets demonstrate that UBagging is effective and statistically significantly superior to a single J48 decision tree learner (SingleJ48), bagging, and equally balanced bagging (BBagging).
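The abstract does not describe how UBagging builds its training subsets, so the following is only a minimal sketch of the general distinction it draws: bags that all share a 1:1 class ratio versus bags whose class ratio varies. It is not the authors' algorithm. The function names (`make_bag`, `make_bags`, `class_counts`, `majority_vote`), the `ratio` parameter, and the example ratios are all assumptions introduced for illustration.

```python
import random
from collections import Counter


def make_bag(labels, ratio, rng):
    """Draw one bag of example indices (hypothetical helper).

    Minority examples are resampled with replacement up to their original
    count; majority examples are sampled with replacement down (or up) to
    ratio * minority count. ratio=1.0 gives an equally balanced bag.
    """
    counts = Counter(labels)
    minority_label = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]
    n_min = len(minority_idx)
    n_maj = max(1, round(ratio * n_min))
    return rng.choices(minority_idx, k=n_min) + rng.choices(majority_idx, k=n_maj)


def make_bags(labels, ratios, seed=0):
    """Build one bag per entry in ratios (majority:minority size ratio)."""
    rng = random.Random(seed)
    return [make_bag(labels, r, rng) for r in ratios]


def class_counts(labels, bag):
    """Count how many examples of each class a bag contains."""
    return Counter(labels[i] for i in bag)


def majority_vote(predictions):
    """Combine the base learners' predictions for one example."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: 95 majority examples (label 0), 5 minority examples (label 1).
labels = [0] * 95 + [1] * 5

# Equally balanced bags: every bag has a 1:1 class ratio.
balanced = make_bags(labels, [1.0, 1.0, 1.0, 1.0])

# Bags with differing class ratios (illustrative values, not from the paper).
uneven = make_bags(labels, [0.6, 1.0, 2.0, 3.0])

for bag in uneven:
    print(dict(class_counts(labels, bag)))
```

In a full ensemble, one base learner (for example a decision tree) would be trained on each bag and `majority_vote` would combine their predictions; how the ratios should be chosen is exactly the question the abstract raises, and the sketch leaves it open.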
Related articles
Roughly Balanced Bagging for Imbalanced Data
Imbalanced class problems appear in many real applications of classification learning. We propose a novel sampling method to improve bagging for data sets with skewed class distributions. In our new sampling method “Roughly Balanced Bagging” (RB Bagging), the numbers of samples in the largest and smallest classes are different, but they are effectively balanced when averaged over all subsets, wh...
An Effective Method for Imbalanced Time Series Classification: Hybrid Sampling
Most traditional supervised classification learning algorithms are ineffective for highly imbalanced time series classification, which has received considerably less attention than imbalanced data problems in data mining and machine learning research. Bagging is one of the most effective ensemble learning methods, yet it has drawbacks on highly imbalanced data. Sampling methods are considered t...
An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets
Most classifiers work well when the class distribution in the response variable of the dataset is well balanced. Problems arise when the dataset is imbalanced. This paper applied four methods: Oversampling, Undersampling, Bagging and Boosting in handling imbalanced datasets. The cardiac surgery dataset has a binary response variable (1=Died, 0=Alive). The sample size is 4976 cases with 4.2% (Di...
Extending Bagging for Imbalanced Data
Various modifications of bagging for class imbalanced data are discussed. An experimental comparison of known bagging modifications shows that integrating with undersampling is more powerful than oversampling. We introduce Local-and-Over-All Balanced bagging where probability of sampling an example is tuned according to the class distribution inside its neighbourhood. Experiments indicate that ...
Neighbourhood sampling in bagging for imbalanced data
Various approaches to extend bagging ensembles for class imbalanced data are considered. First, we review known extensions and compare them in a comprehensive experimental study. The results show that integrating bagging with under-sampling is more powerful than over-sampling. They also make it possible to distinguish Roughly Balanced Bagging as the most accurate extension. Then, we point out that complex...